An Efficient Method to Estimate Bagging's Generalization Error
Authors
Abstract
Bagging [1] is a technique that tries to improve a learning algorithm's performance by using bootstrap replicates of the training set [5, 4]. The computational requirements for estimating the resultant generalization error on a test set by means of cross-validation are often prohibitive: for leave-one-out cross-validation one needs to train the underlying algorithm on the order of mn times, where m is the size of the training set and n is the number of replicates. This paper presents several techniques for estimating the generalization error of a bagged learning algorithm without invoking yet more training of the underlying learning algorithm (beyond that of the bagging itself), as is required by cross-validation-based estimation. These techniques all exploit the bias-variance decomposition [6, 10]. The best of our estimators also exploits stacking [8]. In a set of experiments reported here, it was found to be more accurate than both the alternative cross-validation-based estimator of the bagged algorithm's error and the cross-validation-based estimator of the underlying algorithm's error. This improvement was particularly pronounced for small test sets. This suggests a novel justification for using bagging: more accurate estimation of the generalization error than is possible without bagging.
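For orientation, here is a minimal sketch of the bagging procedure the abstract refers to (training on bootstrap replicates, then aggregating by voting), together with a direct test-set error measurement. The dataset, base learner, and replicate count are illustrative assumptions, not choices taken from the paper, and this is the basic procedure being estimated, not the paper's estimator itself.

```python
# Minimal bagging sketch: train the base learner on bootstrap replicates of
# the training set and aggregate predictions by majority vote.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, random_state=0)
X_train, y_train = X[:200], y[:200]
X_test, y_test = X[200:], y[200:]

n_replicates = 25            # "n" in the cost discussion above (assumed value)
m = len(X_train)             # "m": training-set size

models = []
for _ in range(n_replicates):
    idx = rng.integers(0, m, size=m)   # bootstrap replicate: sample with replacement
    models.append(DecisionTreeClassifier(max_depth=3).fit(X_train[idx], y_train[idx]))

# Aggregate by majority vote and measure the test-set error directly.
votes = np.stack([clf.predict(X_test) for clf in models])
bagged_pred = (votes.mean(axis=0) > 0.5).astype(int)
test_error = np.mean(bagged_pred != y_test)
print(f"bagged test error: {test_error:.3f}")

# Leave-one-out cross-validation of the *bagged* learner would instead require
# on the order of m * n_replicates extra trainings (here 200 * 25 = 5000),
# which is the cost the paper's estimators are designed to avoid.
```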
Similar papers
Estimating the Generalization Performance of an SVM Efficiently
This paper proposes and analyzes an efficient and effective approach for estimating the generalization performance of a support vector machine (SVM) for text classification. Without any computation-intensive resampling, the new estimators are computationally much more efficient than cross-validation or bootstrapping. They can be computed at essentially no extra cost immediately after training a singl...
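The title matches Joachims' xi-alpha-estimator line of work, in which the error rate is estimated directly from the trained SVM's dual coefficients and slack variables by counting training points with rho * alpha_i * R^2 + xi_i >= 1. The sketch below assumes that form; the RBF kernel, the bound R^2 = 1, and rho = 2 are illustrative assumptions rather than the paper's exact definitions.

```python
# Hedged sketch of a resampling-free post-training error estimate in the
# spirit of the xi-alpha estimator. Not the paper's exact recipe.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=1)
svm = SVC(kernel="rbf", C=1.0).fit(X, y)

# Slack variables xi_i = max(0, 1 - y_i f(x_i)) recovered from the model.
margins = (2 * y - 1) * svm.decision_function(X)   # map y in {0,1} to {-1,+1}
xi = np.maximum(0.0, 1.0 - margins)

# Dual coefficients: |dual_coef_| gives alpha_i for the support vectors.
alpha = np.zeros(len(X))
alpha[svm.support_] = np.abs(svm.dual_coef_).ravel()

# Count training points with rho * alpha_i * R^2 + xi_i >= 1. For an RBF
# kernel K(x, x) = 1, so R^2 = 1 is a convenient (assumed) bound.
rho, R2 = 2.0, 1.0
err_estimate = np.mean(rho * alpha * R2 + xi >= 1.0)
print(f"estimated error rate: {err_estimate:.3f}")
```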
Expected Error Analysis for Model Selection
In order to select a good hypothesis language (or model) from a collection of possible models, one has to assess the generalization performance of the hypothesis which is returned by a learner that is bound to use some particular model. This paper deals with a new and very efficient way of assessing this generalization performance. We present a new analysis which characterizes the expected genera...
An inequality related to $\eta$-convex functions (II)
Using the notion of $\eta$-convex functions as a generalization of convex functions, we estimate the difference between the middle and right terms in the Hermite-Hadamard-Fejér inequality for differentiable mappings. As an application we give an error estimate for the midpoint formula.
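For context (this is background, not the paper's new bound), the classical Hermite-Hadamard inequality for a convex function f on [a, b] reads

```latex
f\!\left(\frac{a+b}{2}\right)
  \;\le\; \frac{1}{b-a}\int_a^b f(x)\,dx
  \;\le\; \frac{f(a)+f(b)}{2}.
```

The "middle" and "right" terms are the integral mean and the endpoint average; the Fejér version weights the integrals by a function symmetric about the midpoint, and the paper bounds the gap between these terms under $\eta$-convexity.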
Bounding the Generalization Error of Convex Combinations of Classifiers: Balancing the Dimensionality and the Margins
A problem of bounding the generalization error of a classifier f ∈ conv(H), where H is a "base" class of functions (classifiers), is considered. This problem frequently occurs in computer learning, where efficient algorithms of combining simple classifiers into a complex one (such as boosting and bagging) have attracted a lot of attention. Using Talagrand's concentration inequalities for empirical p...
Estimating Generalization Error Using Out-of-Bag Estimates
We provide a method for estimating the generalization error of a bag using out-of-bag estimates. In bagging, each predictor (single hypothesis) is learned from a bootstrap sample of the training examples; the output of a bag (a set of predictors) on an example is determined by voting. The out-of-bag estimate is based on recording the votes of each predictor on those training examples omitted fro...
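The out-of-bag idea the snippet describes is directly implementable: each training example is scored only by the predictors whose bootstrap sample omitted it, so no extra training runs are needed. A minimal sketch follows; the dataset, base learner, and replicate count are illustrative assumptions.

```python
# Out-of-bag error estimate: vote on each training example using only the
# predictors that did not see it during training.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=200, random_state=0)
m, n_replicates = len(X), 50

oob_votes = np.zeros(m)    # accumulated votes for class 1
oob_counts = np.zeros(m)   # how many predictors omitted each example

for _ in range(n_replicates):
    idx = rng.integers(0, m, size=m)        # bootstrap sample
    oob = np.setdiff1d(np.arange(m), idx)   # examples omitted this round
    if len(oob) == 0:
        continue
    clf = DecisionTreeClassifier(max_depth=3).fit(X[idx], y[idx])
    oob_votes[oob] += clf.predict(X[oob])
    oob_counts[oob] += 1

# Majority vote among out-of-bag predictors only; examples that were never
# out of bag are skipped (very unlikely with 50 replicates).
seen = oob_counts > 0
oob_pred = (oob_votes[seen] / oob_counts[seen] > 0.5).astype(int)
oob_error = np.mean(oob_pred != y[seen])
print(f"out-of-bag error estimate: {oob_error:.3f}")
```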